2023
A Survey of Techniques for Optimizing Transformer Inference
Transformers, Inference Optimization, Pruning, Quantization, Knowledge Distillation, Neural Architecture Search, Hardware Acceleration
A survey of transformer inference optimization techniques: pruning, quantization, knowledge distillation, neural architecture search, and hardware acceleration.
